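To accomplish tasks effectively in multi-robot systems, a problem that must be solved is simultaneous localization and mapping (SLAM). LiDAR (Light Detection and Ranging) is used in many SLAM solutions because of its excellent accuracy, but its performance degrades in featureless environments such as tunnels or long corridors. Centralized SLAM solves the problem with a cloud server, which requires a large amount of computational resources and lacks robustness against failure of the central node. To address these issues, we present a distributed SLAM solution that estimates the trajectories of a group of robots using ultra-wideband (UWB) ranging and odometry measurements. The proposed approach distributes the processing among the robot team and significantly mitigates the computational problems that arise in centralized SLAM. Our solution determines the relative pose between two robots (also known as a loop closure) by minimizing the UWB ranging measurements taken at different positions when the robots are in close proximity. UWB provides a good distance metric in line-of-sight conditions, but retrieving a precise pose estimate remains a challenge because of noise and the robots' unpredictable paths. To handle suspicious loop closures, we use pairwise consistency maximization (PCM) to examine loop-closure quality and perform outlier rejection. The filtered loop closures are then fused with odometry in a distributed pose graph optimization (DPGO) module to recover the full trajectory of the robot team. Extensive experiments are conducted to validate the effectiveness of the proposed approach.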
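As a rough illustration of how range measurements can constrain a relative pose, the sketch below evaluates a sum-of-squared range residuals for a candidate 2D pose. The abstract does not give the paper's cost function, so the 2D setting, the residual form, and the names `range_residual_cost` and `simulate_ranges` are all assumptions made for illustration.

```python
import math


def range_residual_cost(pose, points_a, points_b, ranges):
    """Sum of squared range residuals for a candidate 2D relative pose.

    pose     -- (tx, ty, theta): hypothetical pose of robot B's frame in A's frame
    points_a -- robot A positions from its own odometry, in A's frame
    points_b -- robot B positions from its own odometry, in B's frame
    ranges   -- UWB distances measured between the paired positions
    """
    tx, ty, theta = pose
    c, s = math.cos(theta), math.sin(theta)
    cost = 0.0
    for (ax, ay), (bx, by), d in zip(points_a, points_b, ranges):
        # Express B's position in A's frame under the candidate pose.
        wx = c * bx - s * by + tx
        wy = s * bx + c * by + ty
        predicted = math.hypot(ax - wx, ay - wy)
        cost += (predicted - d) ** 2
    return cost


def simulate_ranges(true_pose, points_a, points_b):
    """Noise-free ranges for a known pose, used only to exercise the cost."""
    tx, ty, theta = true_pose
    c, s = math.cos(theta), math.sin(theta)
    return [
        math.hypot(ax - (c * bx - s * by + tx), ay - (s * bx + c * by + ty))
        for (ax, ay), (bx, by) in zip(points_a, points_b)
    ]
```

With noise-free ranges the cost vanishes at the true pose and is positive elsewhere. A nonlinear least-squares solver could minimize it over `pose`, and an outlier check such as PCM would then filter poor solutions.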
translated by 谷歌翻译
The use of multilingual language models for tasks in low- and high-resource languages has been a success story in deep learning. Recently, Arabic has received widespread attention on account of its dialectal variance. While prior studies have tried to adapt these multilingual models to dialectal variants of Arabic, this remains a challenging problem owing to the lack of sufficient monolingual dialectal data and parallel translation data for such variants. Whether the limited dialectal data can be used to improve models trained on Arabic for its dialectal variants remains an open problem. First, we show that multilingual-BERT (mBERT) incrementally pretrained on Arabic monolingual data takes less training time and yields comparable accuracy when compared to our custom monolingual Arabic model, and beats existing models (by an average metric of $+6.41$). We then explore two continual pre-training methods: (1) using small amounts of dialectal data for continual finetuning and (2) using parallel Arabic-to-English data with a Translation Language Modeling loss function. We show that both approaches help improve performance on dialectal classification tasks ($+4.64$ avg. gain) when used on monolingual models.
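We present a multimodal (vision-and-language) benchmark for cooperative and heterogeneous multi-agent learning. We introduce a benchmark multimodal dataset whose tasks involve collaboration between multiple simulated heterogeneous robots in a rich multi-room environment. We provide an integrated learning framework, multimodal implementations of state-of-the-art multi-agent reinforcement learning techniques, and a consistent evaluation protocol. Our experiments investigate the impact of different modalities on multi-agent learning performance. We also introduce a simple message-passing method between agents. The results suggest that multimodality introduces unique challenges for cooperative multi-agent learning, and that there is significant room for advancing multi-agent reinforcement learning methods in such settings.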
Due to labor shortage and rising labor cost for the apple industry, there is an urgent need for the development of robotic systems to efficiently and autonomously harvest apples. In this paper, we present a system overview and algorithm design of our recently developed robotic apple harvester prototype. Our robotic system is enabled by the close integration of several core modules, including visual perception, planning, and control. This paper covers the main methods and advancements in deep learning-based multi-view fruit detection and localization, unified picking and dropping planning, and dexterous manipulation control. Indoor and field experiments were conducted to evaluate the performance of the developed system, which achieved an average picking rate of 3.6 seconds per apple. This is a significant improvement over other reported apple harvesting robots with a picking rate in the range of 7-10 seconds per apple. The current prototype shows promising performance towards further development of efficient and automated apple harvesting technology. Finally, limitations of the current system and future work are discussed.
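Language-guided embodied AI benchmarks that require an agent to navigate an environment and manipulate objects typically allow only one-way communication: the human user gives a natural language command to the agent, and the agent can only follow the command passively. We present DialFRED, a dialogue-enabled embodied instruction-following benchmark based on the ALFRED benchmark. DialFRED allows an agent to actively ask questions of the human user; the agent uses the additional information in the user's response to better complete its task. We release a human-annotated dataset with 53K task-relevant questions and answers, along with an oracle that can answer questions. To solve DialFRED, we propose a questioner-performer framework in which the questioner is pre-trained with the human-annotated data and fine-tuned with reinforcement learning. We make DialFRED publicly available and encourage researchers to propose and evaluate their solutions for building dialogue-enabled embodied agents.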
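Learning-based methods for training embodied agents typically require a large number of high-quality scenes that contain realistic layouts and support meaningful interactions. However, current simulators for Embodied AI (EAI) challenges provide only simulated indoor scenes with a limited number of layouts. This paper presents Luminous, the first research framework that employs state-of-the-art indoor scene synthesis algorithms to generate large-scale simulated scenes for Embodied AI challenges. In addition, we automatically and quantitatively evaluate the quality of the generated indoor scenes through their ability to support complex household tasks. Luminous incorporates a novel scene generation algorithm, Constrained Stochastic Scene Generation (CSSG), which achieves performance competitive with human-designed scenes. Within Luminous, the EAI task executor, the task instruction generation module, and the video rendering toolkit can collectively generate a massive multimodal dataset of new scenes for training and evaluating Embodied AI agents. Extensive experimental results demonstrate the effectiveness of the data generated by Luminous, enabling a comprehensive assessment of embodied agents on generalization and robustness.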
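Recent approaches based on metric learning have achieved great progress in few-shot learning. However, most of them are limited to image-level representations, which cannot properly handle intra-class variation and spatial knowledge and therefore yield undesirable performance. In this paper, we propose a Deep Bias Rectify Network (DBRN) to fully exploit the spatial information present in the structure of the feature representations. We first employ a bias rectify module to alleviate the adverse effects caused by intra-class variation. The bias rectify module is able to focus on the features that are more discriminative for classification by assigning them different weights. To make full use of the training data, we design an augmentation mechanism that makes the prototypes generated from the support set more representative. To validate the effectiveness of our method, we conduct extensive experiments on various popular few-shot classification benchmarks, where our method outperforms state-of-the-art methods.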
Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.
The development of social media user stance detection and bot detection methods relies heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which suppresses graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
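The selection of property features by information gain can be sketched as follows. This is a minimal illustration for discrete features only, not the MGTAB pipeline. The helper names and the toy feature names (`verified`, `has_bio`, `default_avatar`) are hypothetical, and continuous features would need to be discretized first.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(features, labels, k):
    """Return the names of the k features with the highest information gain.

    features -- dict mapping a feature name to its list of discrete values
    """
    ranked = sorted(
        features,
        key=lambda name: information_gain(features[name], labels),
        reverse=True,
    )
    return ranked[:k]


# Toy data: the feature names below are hypothetical, not MGTAB's.
labels = ["bot", "bot", "human", "human"]
features = {
    "verified": [0, 0, 1, 1],
    "has_bio": [0, 1, 0, 1],
    "default_avatar": [1, 1, 1, 0],
}
selected = top_k_features(features, labels, k=2)
```

In this toy example `verified` separates the two classes perfectly and `has_bio` carries no information about the label, so the former ranks first and the latter is dropped.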
As one of the prevalent methods for building automation systems, Imitation Learning (IL) shows promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in the demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, the environment returns, as the coefficient to build an importance map. We also conduct experiments to investigate three major questions concerning the equality of frame importance, the effectiveness of the importance map, and the connections between importance maps from different IL models. The results show that R2RISE successfully distinguishes important frames from the demonstrations.
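The masking-and-weighting idea can be sketched as below. This is an illustrative approximation, not the authors' implementation. The function name `frame_importance`, the `evaluate_masked` callback (standing in for "retrain the IL model on the masked demonstrations and return the environment return"), and the per-frame averaging are all assumptions.

```python
import random


def frame_importance(num_frames, evaluate_masked, num_masks=200, keep_prob=0.5, seed=0):
    """Estimate per-frame importance from random masks weighted by returns.

    evaluate_masked -- hypothetical callback: takes a 0/1 mask over frames,
                       retrains and evaluates the model, and returns a float.
    """
    rng = random.Random(seed)
    weighted = [0.0] * num_frames
    counts = [0] * num_frames
    for _ in range(num_masks):
        mask = [1 if rng.random() < keep_prob else 0 for _ in range(num_frames)]
        ret = evaluate_masked(mask)
        for i, keep in enumerate(mask):
            if keep:
                # Credit the return to every frame that was kept in this run.
                weighted[i] += ret
                counts[i] += 1
    # Average return over the runs in which each frame was present.
    return [w / c if c else 0.0 for w, c in zip(weighted, counts)]


def toy_evaluate(mask):
    """Stand-in for retraining: frame 2 is the only frame that matters."""
    return 1.0 + (10.0 if mask[2] else 0.0)


importance = frame_importance(num_frames=5, evaluate_masked=toy_evaluate)
```

In the toy setup the return depends only on whether frame 2 is kept, so frame 2 receives the highest score. In practice each call to `evaluate_masked` involves a full retraining run, which makes the number of masks the main cost driver.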